We develop a statistical theory of networks. A network is a set of vertices and links given by its adjacency matrix $c$, and the relevant statistical ensembles are defined in terms of a partition function $Z = \sum_c \exp[-\beta H(c)]$. The simplest cases are uncorrelated random networks such as the well-known Erdős-Rényi graphs. Here we study more general interactions $H(c)$ which lead to correlations, for example, between the connectivities of adjacent vertices. In particular, such correlations occur in optimized networks described by partition functions in the limit $\beta \to \infty$. They are argued to be a crucial signature of evolutionary design in biological networks.

PACS numbers: 89.75.Hc, 89.75.-k, 05.20.-y



Networks describe structures as diverse as the interaction links between proteins in a cell, the wiring of the brain, or the connections of the internet. Recent theoretical work [1,2] has focused on communication networks, and a wealth of quantitative data is now becoming available on networks in molecular biology. Examples include control networks in gene transcription [3], the interaction map of proteins [4], and the pathways of cell metabolism [5]. All these systems consist of many different kinds of molecules linked by complex interactions. Network models are a simplified description, which neglects quantitative aspects of these interactions and focuses solely on their pathways.

A network is a set of vertices $i = 1, \ldots, N$ connected by links. It is uniquely defined by the adjacency matrix $c$, whose entries are $c_{ij} = 1$ if there is a link from $i$ to $j$ and $c_{ij} = 0$ otherwise. We consider here networks with undirected links, where $c$ is symmetric. The connectivity or degree of a vertex is then defined as the total number of links connected to it, $k_i = \sum_j c_{ij}$. The distance $d_{ij}$ between two vertices $i$ and $j$ is the number of links along the shortest path connecting them [6]. We assume the vertices are labeled (for example, by their biochemical identity), so that the correspondence between the adjacency matrix $c$ and its graph is one-to-one [7].

Networks with an irregular wiring naturally lend themselves to a statistical description [6]. We discuss here the equilibrium statistics of networks. The partition function $Z$ can be defined as a sum over all graphs with a fixed number $N$ of vertices and a fixed number $M = \sum_{i<j} c_{ij} = \mathrm{Tr}\, c^2/2$ of links,

$$ Z = \prod_{i<j} \sum_{c_{ij}=0}^{1} \delta\big(M - \mathrm{Tr}\, c^2/2\big) \exp[-\beta H(c)] . \tag{1} $$

Averages over this ensemble are denoted by $\langle \ldots \rangle$. Alternatively, one can define $Z$ with an arbitrary number of links adjusted by a suitable chemical potential. The ensembles of relevance here have a finite average connectivity $\kappa = 2M/N$. For fixed $\kappa$, the distribution of connectivities,

$$ p(k) = \frac{1}{N} \sum_{i=1}^{N} \langle \delta(k_i - k) \rangle , \tag{2} $$

becomes asymptotically independent of $N$, implying that typical adjacency matrices $c$ become sparse for large $N$. This is the case of interest for applications.

A satisfactory mathematical theory exists to date only for what we call uncorrelated random networks [8,9]. In this case, the Hamiltonian depends only on single-point connectivities,

$$ H_1(c) = \sum_{i=1}^{N} f(k_i) , \tag{3} $$

leading to $p(k) \propto \exp[-\beta f(k) - \mu k]/k!$, where the constant of proportionality is fixed by normalization and $\mu$ is adjusted to give the correct average connectivity. Since all graphs with the same connectivities $k_1, \ldots, k_N$ have the same statistical weight, this ensemble ensures the maximally random wiring compatible with the distribution $p(k)$. The simplest example is the well-known Erdős-Rényi graphs, where $\beta H = 0$ and $p(k)$ is Poissonian.
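The single-vertex result can be evaluated numerically. The sketch below, which assumes a cutoff `kmax` beyond which $p(k)$ is negligible, computes $p(k) \propto \exp[-\beta f(k) - \mu k]/k!$ and tunes $\mu$ by bisection until $\sum_k k\,p(k) = \kappa$. The function name `uncorrelated_pk`, the cutoff and the bisection bracket are illustrative choices, not part of the original analysis. With $f \equiv 0$ it should reproduce the Poisson distribution of Erdős-Rényi graphs.

```python
import math

def uncorrelated_pk(f, beta, kappa, kmax=200, lo=-50.0, hi=50.0, tol=1e-12):
    """Hypothetical helper: p(k), k = 0..kmax, with p(k) ∝ exp[-beta f(k) - mu k]/k!,
    where mu is chosen by bisection so that sum_k k p(k) = kappa."""
    def dist(mu):
        # log-space weights avoid overflow of exp(.) and k!
        logw = [-beta * f(k) - mu * k - math.lgamma(k + 1) for k in range(kmax + 1)]
        top = max(logw)
        w = [math.exp(x - top) for x in logw]
        z = sum(w)
        return [x / z for x in w]

    def mean(p):
        return sum(k * pk for k, pk in enumerate(p))

    # the mean connectivity decreases monotonically with mu (its derivative is -Var(k))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(dist(mid)) > kappa:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))

p = uncorrelated_pk(lambda k: 0.0, beta=1.0, kappa=2.4)   # Erdős-Rényi case
```

For $f \equiv 0$ the weights are $(e^{-\mu})^k/k!$, so the bisection should land on $\mu = -\log\kappa$.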

Many natural networks are, however, not of this type. The simplest kind of correlations occurs if the joint distribution of connectivities for neighboring vertices,

$$ q(k,k') = \frac{1}{N\kappa} \sum_{i,j=1}^{N} \langle \delta(k_i - k)\, c_{ij}\, \delta(k_j - k') \rangle , \tag{4} $$

differs from its form for uncorrelated random networks, $q_0(k,k') = (k k'/\kappa^2)\, p(k)\, p(k')$. Higher correlations can be defined in a similar way [10]. Connectivity correlations have been found in growth models of communication networks [11,12] as well as in data of genetic and protein networks [13,14].
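For a single sampled graph, the ensemble averages in (2) and (4) can be replaced by observed frequencies. The following sketch (hypothetical function name `degree_stats`) computes $p(k)$, $q(k,k')$ and the uncorrelated reference $q_0(k,k')$ from a symmetric 0/1 adjacency matrix; it assumes the graph has at least one link, so that $\kappa > 0$.

```python
from collections import Counter

def degree_stats(c):
    """Empirical p(k), q(k,k'), q0(k,k') and kappa for a symmetric 0/1 adjacency matrix c."""
    n = len(c)
    deg = [sum(row) for row in c]                  # k_i = sum_j c_ij
    kappa = sum(deg) / n                           # average connectivity 2M/N
    p = {k: m / n for k, m in Counter(deg).items()}
    # q(k,k') = (1/(N kappa)) sum_{i,j} delta(k_i,k) c_ij delta(k_j,k');
    # each link is counted in both directions, so q is symmetric and sums to 1
    counts = Counter((deg[i], deg[j]) for i in range(n) for j in range(n) if c[i][j])
    q = {kk: m / (n * kappa) for kk, m in counts.items()}
    q0 = {(k, kp): k * kp / kappa**2 * p[k] * p[kp] for k in p for kp in p}
    return p, q, q0, kappa

star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
p, q, q0, kappa = degree_stats(star)
```

In this four-vertex star every link joins the hub to a leaf, so $q(1,3) = q(3,1) = 1/2$, whereas $q_0$ puts weight $1/4$ on each of the four pairs, including $q_0(1,1)$; the mismatch is exactly the kind of connectivity correlation defined in (4).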
These observations call for a statistical theory of more general ensembles called correlated random networks, which is the subject of this Letter. The ensembles of interest are characterized by finite distributions $p(k)$ and $q(k,k')$ in the limit of large $N$. One then expects a universal logarithmic scaling of the average distance between vertices, $\sum_{i,j \in \Omega} \langle d_{ij} \rangle / N_\Omega^2 \sim \log N_\Omega$, in any connected component $\Omega$ with $N_\Omega$ nodes [15]. This is consistent with our numerical findings. Hence, correlated random networks maintain a sparse connectivity matrix and are locally tree-like. The 'inverse temperature' $\beta$ in (1) measures the deviation from Erdős-Rényi graphs. Quite remarkably, these structural properties are preserved in the limit $\beta \to \infty$, where we obtain nontrivial optimized networks. Ensembles of this kind generically have strong correlations.

The simplest type of Hamiltonian producing correlations has nearest-neighbor connectivity interactions,

$$ H_2(c) = \sum_{i<j} c_{ij}\, g(k_i, k_j) , \tag{5} $$
where $g(k,k')$ is some function of the connectivities. The resulting class of graph ensembles can be seen as a showcase for correlated random networks where analytic expressions can be derived. Higher-order correlations are generated by Hamiltonians with next-nearest-neighbor interactions, etc. We also study a Hamiltonian $H = H_1 + \lambda H_d$ with a nonlocal part,

$$ H_d(c) = \sum_{i<j} d_{ij} , \tag{6} $$

often called the diameter of the graph. For $\lambda > 0$ and a suitable scaling $\lambda \sim 1/(N \log N)$, this Hamiltonian is found to generate compactified networks with finite $p(k)$ and $q(k,k')$, provided the extra term $H_1$ stabilizes the network against collapse to a star.
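To make the local/nonlocal distinction concrete, the sketch below evaluates $H_2$ from (5) for a user-supplied $g(k,k')$ and $H_d$ from (6) by breadth-first search. The function names are hypothetical, and returning `math.inf` for a disconnected graph (where some $d_{ij}$ is infinite) is our assumption rather than a convention stated in the text. Note that $H_2$ only needs the connectivities at the two ends of each link, whereas $H_d$ needs all shortest paths.

```python
import math
from collections import deque

def h2(c, g):
    """Nearest-neighbor term H_2 = sum_{i<j} c_ij g(k_i, k_j), Eq. (5)."""
    n = len(c)
    deg = [sum(row) for row in c]
    return sum(g(deg[i], deg[j]) for i in range(n) for j in range(i + 1, n) if c[i][j])

def bfs_distances(c, s):
    """Shortest-path lengths d_sj from vertex s; None marks unreachable vertices."""
    dist = [None] * len(c)
    dist[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v, linked in enumerate(c[u]):
            if linked and dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def hd(c):
    """Nonlocal term H_d = sum_{i<j} d_ij, Eq. (6); math.inf if the graph is disconnected."""
    n = len(c)
    total = 0
    for i in range(n):
        d = bfs_distances(c, i)
        for j in range(i + 1, n):
            if d[j] is None:
                return math.inf
            total += d[j]
    return total
```

For four vertices, a star gives $H_d = 3 \cdot 1 + 3 \cdot 2 = 9$ while a linear chain gives $H_d = 10$, so the nonlocal term alone already favors hub-dominated wiring.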




FIG. 1. Optimized networks, obtained from local interactions (left) and nonlocal interactions (right) for large $\beta$. Hubs of high connectivity (filled circles) are preferentially connected to peripheral vertices of low connectivity (empty circles).

Compactified networks occur in communication and transport [16], and may play a role in biology [17,18]. For example, distance-optimized networks obtained from (6) show an abundance of high-connectivity vertices (hubs), which are preferentially connected to peripheral vertices of low connectivity. A very similar structure (with even more pronounced hubs) can also be obtained from local interactions of the form (5). Typical graphs are shown in Fig. 1.

To establish these results, we first discuss the analytical treatment of local interactions, choosing a generic Hamiltonian of the form $H = H_1 + H_2$ given by (3) and (5). From the partition function (1), the free energy is derived by using an integral representation for the constraints due to the connectivity $k_i$ at each vertex and making a Hubbard-Stratonovich transformation. The resulting integral can be evaluated in a saddle-point approximation, yielding the reduced free energy per vertex in the thermodynamic limit,
$$ -\beta f = \lim_{N\to\infty} \frac{1}{N} \log Z = \frac{\kappa}{2} \log N + \frac{\kappa}{2} (\log\kappa - 1) + \cdots + \log \Big[ \sum_k \frac{(\cdots)^k}{k!}\, e^{-\beta f(k)} \Big] \tag{7} $$

The 'order parameters' $Q_k$ and the chemical potential $\mu$ have to be determined self-consistently from the saddle-point condition. This form is closely related to a field-theoretic approach [9,19,20], where networks appear as the Feynman diagrams of a Gaussian integral with a propagator matrix whose elements are $\exp[-\beta g(k,k')] - 1$, and with interactions as specified by the last term in (7). Notice the super-extensive scaling of the entropy, $(\kappa/2)\log N$, which reflects that, unlike in a regular lattice, each vertex can be connected to all $N - 1$ other vertices. The last term in (7) is directly related to the degree distribution,

$$ p(k) = C\, \frac{(\cdots)^k}{k!}\, \exp[-\beta f(k)] , \tag{8} $$

where $C$ is a normalization constant. For example, a power-law tail in $p(k)$ may be generated by a suitable choice of the weights $f(k)$, but it is not generic.
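The super-extensive $(\kappa/2)\log N$ term can be checked by direct counting. At $\beta = 0$, Eq. (1) simply counts the ways to choose $M = \kappa N/2$ links among the $N(N-1)/2$ vertex pairs, a binomial coefficient. The sketch below (hypothetical helper names) shows numerically that $(1/N)\log Z - (\kappa/2)\log N$ approaches a finite constant as $N$ grows; by Stirling's formula this constant is $(\kappa/2)(1 - \log\kappa)$ for the bare link count. It checks only the leading divergence, not the individual finite terms of (7).

```python
import math

def log_num_graphs(n, kappa):
    """log of the number of labeled simple graphs with n vertices and M = kappa*n/2 links,
    i.e. log binomial(n(n-1)/2, M), evaluated with lgamma to avoid huge integers."""
    pairs = n * (n - 1) // 2
    m = round(kappa * n / 2)
    return math.lgamma(pairs + 1) - math.lgamma(m + 1) - math.lgamma(pairs - m + 1)

def entropy_remainder(n, kappa):
    """(1/N) log Z at beta = 0 minus the super-extensive term (kappa/2) log N."""
    return log_num_graphs(n, kappa) / n - 0.5 * kappa * math.log(n)

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, entropy_remainder(n, 2.4))
```

Without subtracting $(\kappa/2)\log N$, the printed values would grow without bound, which is the super-extensive scaling noted in the text.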

A more detailed account of networks with generic local interactions will be published elsewhere [21]. Here we turn to the simplest Hamiltonian with local interactions producing nontrivial optimized networks, see Fig. 1. It has the form $H_L = H_1 + H_2$ with $H_1(c) = -(1/2)\sum_i k_i^2 + \eta \sum_i k_i^3$ and $H_2(c) = \zeta \sum_{i<j} c_{ij}\, \delta_{k_i,1}\, \delta_{k_j,1}$.
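The following sketch evaluates $H_L$ for a given adjacency matrix, with $\eta$ and $\zeta$ as free parameters (the function names are hypothetical). It also counts paths of length two directly, as $\sum_l k_l(k_l - 1)/2$, so one can check that $\frac{1}{2}\sum_i k_i^2$ equals that count plus the fixed number of links $M$.

```python
def h_local(c, eta, zeta):
    """H_L = -(1/2) sum_i k_i^2 + eta sum_i k_i^3 + zeta sum_{i<j} c_ij delta(k_i,1) delta(k_j,1)."""
    n = len(c)
    deg = [sum(row) for row in c]
    h1 = sum(-0.5 * k**2 + eta * k**3 for k in deg)
    # isolated links: both end points have connectivity 1
    isolated = sum(1 for i in range(n) for j in range(i + 1, n)
                   if c[i][j] and deg[i] == 1 and deg[j] == 1)
    return h1 + zeta * isolated

def paths_of_length_two(c):
    """Number of paths i-l-j with i != j, i.e. sum_l k_l (k_l - 1) / 2."""
    return sum(k * (k - 1) // 2 for k in (sum(row) for row in c))

star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
two_links = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
```

For small $\eta$ the four-vertex star has $H_L = -6 + 30\eta$, while two isolated links pay the penalty $2\zeta$, which is how $H_2$ with large $\zeta$ suppresses disconnected pieces.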

The first term, $(1/2)\sum_i k_i^2 = (1/2)\sum_{i,j,l} c_{il} c_{lj}$, gives the number of paths of length two on the graph, up to the constant $M$ from backtracking paths with $i = j$. It rewards the formation of hubs, i.e., highly connected vertices, which in turn lead to short distances. In fact, this term has the maximally compact, starlike configuration as its ground state. The collapse to a star, where the connectivity of the central vertex scales with the size of the graph, is, however, prevented by the regularization term $\eta \sum_i k_i^3$. The correlation term $H_2$ with $\zeta \to \infty$ suppresses single, isolated links connecting two vertices of connectivity 1. Without this term, an extensive number of isolated links remains even in the limit $\beta \to \infty$, leading to graphs with a large number of disconnected parts. A minimal connectivity of 1 is enforced for each vertex. For this Hamiltonian, the free energy (7) contains only one non-zero order parameter, $Q_1$, which is fixed by $p(1)$. The chemical potential $\mu$ is determined by $\sum_k k\, p(k) = \kappa$. Remarkably, the connectivity correlation can be obtained from $H_2$ and single-vertex quantities,

$$ q(k,k') \propto k\,p(k)\,t_k\; k'\,p(k')\,t_{k'}\; \exp[-\beta g(k,k')] . \tag{9} $$

The constant of proportionality is fixed by normalization, and the $t_k$ are determined by the marginal distribution $\sum_{k'} q(k,k') = k\,p(k)/\kappa$, giving (for $\zeta \to \infty$) $t_1 = \kappa/[T(\kappa - p(1))]$ and $t_k = T = \sqrt{\kappa(\kappa - 2p(1))}/(\kappa - p(1))$ for $k > 1$.
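As a consistency check of these expressions, the sketch below builds $q(k,k')$ from (9) for $\zeta \to \infty$, where $\exp[-\beta g(k,k')]$ is 0 for $k = k' = 1$ and 1 otherwise, using the $t_1$ and $T$ above, and compares the marginal with $k\,p(k)/\kappa$. The input $p(k)$ is an arbitrary illustrative choice, not data from the text, and the formula for $T$ requires $\kappa > 2p(1)$.

```python
import math

def q_from_eq9(p):
    """q(k,k') ∝ k p(k) t_k k' p(k') t_k' exp[-beta g(k,k')] in the limit zeta -> infinity.
    p: dict {k: p(k)} with k >= 1. Returns (q, kappa)."""
    kappa = sum(k * pk for k, pk in p.items())
    p1 = p.get(1, 0.0)
    T = math.sqrt(kappa * (kappa - 2 * p1)) / (kappa - p1)
    t = {k: (kappa / (T * (kappa - p1)) if k == 1 else T) for k in p}
    # links between two connectivity-1 vertices carry weight exp(-infinity) = 0
    raw = {(k, kp): (0.0 if k == kp == 1 else k * p[k] * t[k] * kp * p[kp] * t[kp])
           for k in p for kp in p}
    norm = sum(raw.values())          # constant of proportionality from normalization
    return {kk: v / norm for kk, v in raw.items()}, kappa

p = {1: 0.6, 2: 0.1, 3: 0.1, 8: 0.2}   # illustrative p(k), kappa = 2.7
q, kappa = q_from_eq9(p)
for k in p:
    print(k, sum(q[(k, kp)] for kp in p), k * p[k] / kappa)
```

The two printed columns should agree; the ratio $t_1/T = (\kappa - p(1))/(\kappa - 2p(1))$ is what makes the $k = 1$ and $k > 1$ marginals consistent at the same time.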

The properties of optimized networks resulting from the Hamiltonian $H_L$ are readily inferred from Eqs. (8) and (9). At finite values of $\beta$, one finds that the degree distribution (8) has an exponentially decaying tail. In order to analyze the limit $\beta \to \infty$, we replace the sum over $k$ in (7) by an integral. One finds that the vertices arrange themselves into hubs of connectivity

$$ k^* = \frac{1 - 2\eta}{4\eta} \tag{10} $$

and peripheral vertices of connectivity 1. The peripheral vertices are connected only to hubs, while the hubs form an uncorrelated random network.
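A heuristic way to see where (10) comes from (a ground-state argument of our own, not the integral evaluation used in the text): suppose a fraction $x$ of the vertices are hubs of connectivity $k$ and the rest have connectivity 1, with $xk + (1 - x) = \kappa$ and $k > \kappa > 1$. At $\beta \to \infty$ the entropy drops out, and the $H_1$ energy per vertex is $f(1) + x[f(k) - f(1)]$ with $f(k) = -k^2/2 + \eta k^3$. Eliminating $x = (\kappa - 1)/(k - 1)$ leaves $[f(k) - f(1)]/(k - 1) = -(k + 1)/2 + \eta(k^2 + k + 1)$ to be minimized, whose stationary point is $k^* = (1 - 2\eta)/(4\eta)$. The sketch minimizes this convex function numerically by ternary search (our choice of method) and compares with (10).

```python
def energy_per_excess_link(k, eta):
    """[f(k) - f(1)] / (k - 1) for f(k) = -k^2/2 + eta k^3 (hub of connectivity k)."""
    f = lambda x: -0.5 * x**2 + eta * x**3
    return (f(k) - f(1.0)) / (k - 1.0)

def hub_connectivity(eta, lo=1.0 + 1e-9, hi=1e4, iters=300):
    """Minimize the convex function above over k > 1 by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if energy_per_excess_link(m1, eta) < energy_per_excess_link(m2, eta):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

for eta in (0.03, 0.01, 0.001):
    print(eta, hub_connectivity(eta), (1 - 2 * eta) / (4 * eta))
```

For $\eta = 0.03$, the value used in the simulations, both columns give $k^* \approx 7.8$.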

A remarkably similar structure is found for compactified networks generated by the Hamiltonian with nonlocal interactions $H = H_1 + \lambda H_d$, with $H_d$ given by (6) and $H_1 = \eta \sum_i k_i^3$. For $\lambda > 0$, the nonlocal part $H_d$ favors networks with short distances, while $H_1$ prevents the collapse to a star as before. Since $H_d$ sums about $N^2/2$ distances of order $\log N$, choosing $\lambda = 2/(N \log N)$ keeps $\lambda H_d$ extensive, and one obtains a well-defined thermodynamic limit with the average distance between vertices scaling as $\log N$. We have studied this ensemble, as well as the case of local interactions $H_L$, by a Monte Carlo link dynamics. Starting, for example, from an Erdős-Rényi graph, randomly chosen links are moved to previously unlinked vertex pairs with probability $p = \min(1, \exp[-\beta \Delta H])$, where $\Delta H$ denotes the corresponding change in the Hamiltonian. The minimum degree of 1 is enforced throughout, and self-links are excluded [7]. No dependence on the initial conditions has been found. For the local Hamiltonian we use a network with $N = 200$, $\kappa = 2.4$, $\eta = 0.03$; for the nonlocal Hamiltonian we use $N = 100$, $\kappa = 2.4$, $\eta = 0.001$. We averaged over 100 samples. Fig. 2 juxtaposes analytical and numerical results for local interactions on the left with numerical results for nonlocal interactions on the right. The connectivity distribution $p(k)$ shows the formation of high-connectivity hubs in both cases.
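The link dynamics can be sketched as a Metropolis update on a set of links, here for the local Hamiltonian $H_L$. Several simplifications are our assumptions, not the authors' code: $H_L$ is recomputed from scratch after each trial move (simple but slow), a large finite $\zeta$ stands in for $\zeta \to \infty$, the start is a ring with random chords rather than an Erdős-Rényi graph (so that every vertex has connectivity at least 1 from the outset), and the minimum-degree check is conservative. All function names are hypothetical.

```python
import math
import random

def hamiltonian_local(deg, edges, eta, zeta):
    """H_L = sum_i [-(1/2) k_i^2 + eta k_i^3] + zeta * (number of links joining two k = 1 vertices)."""
    h1 = sum(-0.5 * k**2 + eta * k**3 for k in deg)
    isolated = sum(1 for i, j in edges if deg[i] == 1 and deg[j] == 1)
    return h1 + zeta * isolated

def initial_graph(n, m, rng):
    """Hypothetical start: a ring over all n >= 3 vertices plus random chords up to m >= n links."""
    edges = {tuple(sorted((i, (i + 1) % n))) for i in range(n)}
    while len(edges) < m:
        edges.add(tuple(sorted(rng.sample(range(n), 2))))
    return edges

def link_dynamics(n, kappa, beta, eta, zeta=1e3, sweeps=200, seed=0):
    """Metropolis link moves for H_L; returns the final connectivities and link set."""
    rng = random.Random(seed)
    edges = initial_graph(n, round(kappa * n / 2), rng)
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    h = hamiltonian_local(deg, edges, eta, zeta)
    for _ in range(sweeps * len(edges)):
        old = rng.choice(sorted(edges))                # link to be moved
        new = tuple(sorted(rng.sample(range(n), 2)))   # target pair, never a self-link
        if new in edges:
            continue
        a, b = old
        if deg[a] == 1 or deg[b] == 1:                 # conservative: keep minimum degree 1
            continue
        edges.remove(old)
        edges.add(new)
        for v, d in ((a, -1), (b, -1), (new[0], 1), (new[1], 1)):
            deg[v] += d
        h_new = hamiltonian_local(deg, edges, eta, zeta)
        # accept with probability min(1, exp[-beta * dH]); otherwise undo the move
        if rng.random() < math.exp(min(0.0, -beta * (h_new - h))):
            h = h_new
        else:
            edges.remove(new)
            edges.add(old)
            for v, d in ((a, 1), (b, 1), (new[0], -1), (new[1], -1)):
                deg[v] += d
    return deg, edges
```

Both the link to move and the target pair are drawn uniformly, so the proposal is symmetric and the acceptance rule alone sets the stationary weight $\exp[-\beta H_L]$. Histogramming `deg` over many independent runs gives an estimate of $p(k)$.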
FIG. 2. Statistical features of correlated random networks with local interactions (left) and nonlocal interactions (right).

(a) Single-point connectivity distribution $p(k)$ for various values of $\beta$: the Poisson form (dotted line), data for intermediate $\beta$ (open circles) and large $\beta$ (filled squares), and analytical values (8) for the case of local interactions (solid lines).

(b) Neighbor connectivity distribution $q(k,k')$. Left: local interactions, analytical form (9) for $\beta = 3$. Right: nonlocal interactions, numerical results for $\beta = 15$.

(c) Average relative entropy $\langle S(q|q_0)\rangle$ (circles), compared to the average sampling entropy $\langle S_0(q_0)\rangle$ (squares) and its standard deviation $\langle \Delta S_0(q_0)\rangle$; see text.

(d) Average inverse distance $K$ as a function of $\beta$ (open circles), compared to the same quantity for the equivalent uncorrelated network (open squares) and for the equivalent locally correlated network (right figure, open diamonds); see text.






For local interactions the hub connectivity is given by (10); for nonlocal interactions the distribution remains broad even in the limit $\beta \to \infty$ (Fig. 2(a)) [22]. Low-connectivity vertices are preferentially attached to hubs, as indicated by the peaks at $q(1,k^*)$ for local interactions and the corresponding peaks of $q(k,k')$ for nonlocal interactions (Fig. 2(b)). The deviation from uncorrelated random networks is measured by the relative entropy $S(q|q_0) = \sum_{k,k'} q(k,k') \log[q(k,k')/q_0(k,k')]$. In a single network of size $N$, we obtain the estimate $S(q|q_0)$ from the observed frequencies $q(k,k')$ and $p(k)$, with $q_0(k,k') = (k k'/\kappa^2)\, p(k)\, p(k')$. We then generate a sufficient number of uncorrelated random networks [24] with frequencies $q_a(k,k')$ and sampling entropies $S(q_a|q_0)$; their average and standard deviation are denoted by $S_0(q_0)$ and $\Delta S_0(q_0)$, respectively. Connectivity correlations in the original network are significant if $S(q|q_0) - S_0(q_0) > \Delta S_0(q_0)$. This is typically the case above a certain optimization degree $\beta_{\rm on}$, as shown by the ensemble averages over 10 samples, $\langle S(q|q_0)\rangle$, $\langle S_0(q_0)\rangle$, and $\langle \Delta S_0(q_0)\rangle$, in Fig. 2(c).
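The significance test can be sketched as follows. $S(q|q_0)$ is estimated from observed frequencies, and the uncorrelated reference networks are generated here by degree-preserving double-link swaps, a standard randomization that keeps every $k_i$ (and hence $p(k)$ and $q_0$) fixed; whether this matches the construction of [24] exactly is an assumption. Function names, the number of swap attempts and the number of reference networks are illustrative choices.

```python
import math
import random
import statistics
from collections import Counter

def relative_entropy(edges, n):
    """Estimate S(q|q0) = sum_{k,k'} q log(q/q0) from the link list of a single graph."""
    deg = Counter()
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    kappa = 2 * len(edges) / n
    p = {k: m / n for k, m in Counter(deg[v] for v in range(n)).items()}
    q = Counter()
    for i, j in edges:                       # each link counts as (k_i,k_j) and (k_j,k_i)
        q[(deg[i], deg[j])] += 1
        q[(deg[j], deg[i])] += 1
    s = 0.0
    for (k, kp), m in q.items():
        qk = m / (2 * len(edges))            # normalization 1/(N kappa) = 1/(2M)
        q0 = k * kp / kappa**2 * p[k] * p[kp]
        s += qk * math.log(qk / q0)
    return s

def degree_preserving_swaps(edges, attempts, rng):
    """Rewire (a,b),(c,d) -> (a,d),(c,b) at fixed connectivities,
    rejecting swaps that would create self-links or double links."""
    edges = set(edges)
    for _ in range(attempts):
        old1, old2 = rng.sample(sorted(edges), 2)
        a, b = old1
        c, d = old2 if rng.random() < 0.5 else old2[::-1]
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or new1 in edges or new2 in edges:
            continue
        edges -= {old1, old2}
        edges |= {new1, new2}
    return edges

def correlation_significance(edges, n, n_ref=50, swaps_per_link=10, seed=0):
    """Return S(q|q0), S_0(q0), Delta S_0(q0) and whether S - S_0 > Delta S_0."""
    rng = random.Random(seed)
    s = relative_entropy(edges, n)
    ref = [relative_entropy(degree_preserving_swaps(edges, swaps_per_link * len(edges), rng), n)
           for _ in range(n_ref)]
    s0, ds0 = statistics.mean(ref), statistics.stdev(ref)
    return s, s0, ds0, s - s0 > ds0
```

Links are stored as sorted pairs `(i, j)` with `i < j`. Because swaps never change any $k_i$, each reference network shares the original $q_0$, and only the wiring between connectivity classes is randomized.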

Both kinds of networks become more compactified with increasing $\beta$, as shown by the average inverse distance $K \equiv [2/(N(N-1))] \sum_{i<j} d_{ij}^{-1}$ (Fig. 2(d)) [23]. We also plot $K$ for the equivalent uncorrelated random networks; no such compactification is seen. Hence, the one-point distribution $p(k)$ may miss important functional properties. On the other hand, the nonlocally interacting networks and their equivalent locally interacting counterparts (constructed to have the same $p(k)$ and $q(k,k')$) have a very similar degree of compactification [24]. This illustrates how optimization induces correlations.
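A convenient property of $K$ is that pairs in different components contribute $1/d_{ij} = 0$ instead of an infinite distance, so $K$ stays well defined for graphs that are not connected. A short sketch, assuming an adjacency-list input and a breadth-first search from every vertex (function name hypothetical):

```python
from collections import deque

def average_inverse_distance(adj):
    """K = 2/(N(N-1)) sum_{i<j} 1/d_ij; unreachable pairs contribute 0.
    adj: list of neighbor lists of an undirected graph."""
    n = len(adj)
    total = 0.0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # ordered pairs (s, t): each unordered pair is counted twice, hence 1/(N(N-1))
        total += sum(1.0 / d for t, d in dist.items() if t != s)
    return total / (n * (n - 1))

star = [[1, 2, 3], [0], [0], [0]]
chain = [[1], [0, 2], [1, 3], [2]]
```

On four vertices the star reaches $K = 3/4$ and the linear chain only $K = 13/18 \approx 0.72$, in line with hubs shortening distances.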

In summary, we have shown how interactions shape the structure of a network. Hamiltonians beyond the 'single-vertex' form (3) generate correlations such as a neighbor connectivity distribution $q(k,k')$ which differs from that in uncorrelated networks. Higher correlations can be defined in a similar way [10]. These correlations provide a more detailed fingerprint of the interactions present than the single-point connectivity distribution $p(k)$. This observation should carry over to the dynamical rules for non-equilibrium ensembles such as the well-known growth models [1,11].

In transcription control networks, structural motifs have been identified that can be expressed in terms of connectivity correlations [13]. Such correlations have also been observed in protein networks [14]. In view of our findings for optimized networks, they appear to be a natural consequence of the underlying dynamics and functional optimization. We expect the data to give important information on the underlying design principles of networks and on the selective forces governing their evolution. Reverse engineering seems feasible, with the aim of inferring the relevant dynamics from the data. The nonequilibrium theory of correlated random networks will thus be an important avenue for future research.
k, l at random with uniform probability and rewire them